Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] [stream] Add resource refcounting analysis pass #20075

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

drprajap
Copy link

@drprajap drprajap commented Feb 24, 2025

basic refconting based on map, finds last user and adds dealloca
changes with add/dec ref and lastuse ops
TODOs

  • refine IR test and add checks
  • dealloca is conditionally inserted via scf.if, when does the condition left to fold during runtime, which case would hit this?
  • currently this pass mostly would work for transient allocation, need to understand more for other casese like IPO/globals/external ABI lifetime ownership transfers

- basic refconting based on map, finds last user and adds dealloca

TODOs
- use ResourceLastUseOp and lower it to dealloca in, can be moved to StreamOpFolders
- dealloca is conditionally inserted via scf.if, when does the condition left to fold
  during runtime, which case would hit this?
- currently this pass mostly would work for transient allocation, need to understand more
  for other casese like IPO/globals/external ABI lifetime ownership transfers
Copy link
Collaborator

@benvanik benvanik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Darn, we should have really chatted about this - I've been working on this and had started with a similar solution but it does not end up working out (last use analysis is insufficient, and interop across program boundaries is required to make this work with custom modules and hosting applications that also need to track deallocations).

As a meta comment, we should start with some smaller PRs - there's a lot of cleanup to do in here and it's easier to learn the style/conventions by making smaller more frequent PRs than large ones that needs a lot of rounds of feedback. Patterns and folders, improvements to existing tests, fixes to existing passes, etc are all better to start on than a new pass.

@drprajap
Copy link
Author

Oh ok, that is why the runtime hang and the sharded program never ran to completion with just this last use analysis, I was mostly tracking and verifying things from IR level that before placing deallocas, they have proper synchronization points from its users,

when you say hosting application, is this when input data gets copied for GPU usage from application/outside program and we use alloca/transient buffers for them and they should be tracking from hosting application/program ? can you elaborate the use-case?

yes, I agree that this PR looks out of place and time should have spent to clean it up before sending your way, I was racing to make it to see if this analysis work would help in sharded oom issue, and when I saw memory usage dropped significantly I became hopeful :) but I guess the synchronization beyond program was the part I wasn't paying much attention to.

do you have any reference I can take a look for your work and taking this PR further, or we can purge this PR?

@benvanik
Copy link
Collaborator

Yeah, unfortunately when it comes to full programs we need a combination of runtime and compiler behavior - we can still use analysis to clean up things when knowable at compile time but there will be cases where it fails and we'll have to track at runtime. I toyed with keeping things in the compiler but with a shared ref count (so like your PR but with a pointer-to-an-integer instead of integer ref count) but there's a lot of cases where that ends up leaking out (if a function returns 2 buffers to the calling code they may be the same buffer, different buffers, or subviews of the same buffer and deallocation order is critical) - it ends up ballooning into full alias analysis. We also need to be able to adopt/take over buffers provided by the callers and make sure things round trip - if we return an allocation and the user then passes it right back in as an argument we want to be able to deallocate it (or reuse it), etc.

No worries, and I'm glad you sent it out early :) We can coordinate on discord (#core-development is a good channel) if you're looking to take on larger work to make sure no one else is actively looking at it. Andrew is looking at improvements for ElideTimepoints right now and there are definitely other passes that could use some love. The trick is to look at IR and try to find things that should not be there: it's a great way to learn about the IR, the kind of programs we're (currently) compiling, and make sure we are implementing things as simply as possible (a cleanup pattern or two in an op canonicalizer can sometimes replace the need for an entire pass, or take the load off of a pass so it can be simpler by assuming the IR is in a canonical state). There's been a lot of recent improvements on integer range analysis that should let us dramatically improve the LayoutSlices pass on dynamic models, for example, as today it only ever matches dynamic shapes exactly instead of say knowing that allocation C is 2x allocation A and B such that A and B can be packed into C's storage - if our dynamic models show that having potential benefit it'd be a really good place to dig into.

As for this, I'm still iterating on the design - I basically hit the end of what the compiler-only approach would do on Friday and am going to take another stab at it now that shortfin has been mostly wired up to do the right thing (in the wrong way, because there's no right way currently :) and we have a use case.

@drprajap
Copy link
Author

Sounds good. Your suggestions are great, will spend some time to go over those. This pass was good kick-off to get hands-on for core iree development areas and getting familiarity, turned out to be a little bit involved :D..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants